ABSTRACT



Practicality is the Achilles’ heel of alternate assessment in middle school science. This 
five-year study of an "early adopter" school explores factors that enable alternate 
assessment to thrive in spite of practical problems. Interviews of ten seventh- and 
eighth-grade teachers who initiated and sustained new assessment methods indicate that 
in the early years they explored various approaches and were not blamed for problems. 
Since many aspects worked well, the teachers were convinced by their own experience 
of students’ increased ability to understand and explain science concepts. A collegial 
atmosphere and administrative support in time and resources helped to sustain the 
innovation. 

For the past five years, middle school teachers in a suburban school in the Northeast 
have added alternate assessment to the traditional end-of-year exams in their science classes. 
During this time, staff changes have resulted in the coming and going of three science 
supervisors, two principals, two assistant superintendents, and two superintendents. Only two 
of the original six teachers involved are still teaching in the same positions. In spite of 
personnel changes, the alternate assessment component has remained an important aspect of 
the final assessment, and in fact has become more effective and more efficient to administer. 
Alternate assessment techniques are also being increasingly used throughout the school year. 
This longitudinal case study attempts to clarify the factors which have helped new 
assessments become firmly established in these two grades. 

Across the U.S., reform efforts in science education have included a call to change 
assessment systems so they will more adequately represent the kinds of thinking students do 
in inquiry-based science classes (AAAS, 1993; Bybee et al., 1990). To date there have been 
few studies which observe the adoption of alternate assessment over a period of years. This 
school was one of the early adopters of alternate assessment and thus provides a five-year 
perspective. 



Implementing and Sustaining Change 

The introduction of new modes of assessing students is but one of many reforms in 
teaching science, mathematics and technology now being implemented. Literature on past 
reform efforts and current approaches to improving science literacy frames the experiences of 
one set of teachers in one school. The research and development agenda for alternative 
assessment proposed by Champagne and Newell (1992) calls for studies to determine how 
teachers respond to alternative assessment as an innovation and how implementation of the 
innovation influences the teachers’ classroom practices and understanding of how students 
learn. 

Hargreaves (1996) points to the need for studies of the context in which teachers 
respond to innovation. Teachers’ voices are situated in the school environment they have 
experienced. Some contexts are supportive and call forth good teaching while other contexts 
create resistance to innovation. In looking at the reasons for the five-year growth trajectory 
of alternative assessment in the school being studied, we must ask what made this growth 
possible. 

After studying twenty years of innovation in science education, Hall (1992) concludes 
that effective reforms have seen change as a process, not an event. Hall also believes that 
change is most effective when all players, from policy makers to practitioners, work together 
on a level playing field to serve the best interests of students. 

The Progressive Movement and the NSF-sponsored science curricula of the 1950s 
and 1960s provide insight into the process of long-term innovation in schools. Elmore 
(1996) uses these historical movements to develop a model of the ways teachers engage in 
intentional learning about new ways to teach: 

While knowledge is not deep on the subject, the following seems plausible; teachers 
are more likely to learn from direct observation of their own practice and trial and 
error in their own classrooms than they are from abstract descriptions of new teaching; 
changing teaching practice, even for committed teachers, takes a long time, and several 
cycles of trial and error; teachers have to feel that there is some compelling reason for 
them to practice differently, with the best direct evidence being that students learn 
better; and teachers need feedback from sources they trust about whether students are 
actually learning what they are taught (p. 24). 

Essentially the same conclusions are drawn by Sparks and Loucks-Horsley (1990) following a 
review of the literature on effective staff development. They add two points: support for 
collegiality and administrators who "vigorously support" teachers’ attempts to adopt new 
practices. 

Not every teacher is willing to accept innovation. Johnson, Johnson and Holubec 
(1988) state that teachers must be at least somewhat dissatisfied with the current practice and 
be convinced that the new practice will have a desired effect on students. They must also be 
convinced of the feasibility of the innovation. If these beliefs are not in place, the teachers 
will believe the cost of changing is too high and will resist the innovation. 

Johnston et al. (1990) reflect on ten years of innovation in the Pittsburgh schools and 
conclude that sustained support was significant. Change takes time. The amount of time 
needed can frustrate everyone who desires quick solutions that will garner political support. 

In summary, innovation has been sustained in schools where there is an atmosphere of 
collegiality and innovation, staff support which recognizes teachers as professionals, and a 
general willingness to spend time and money on the new practice and to accept temporary 
failures. Perhaps most important, innovation is sustained when teachers are convinced from 
their own experience that their students are learning more effectively. 

Sample and Site Information 

Over the five years of this study, I interviewed five seventh-grade life sciences 
teachers and five eighth-grade physical sciences teachers.¹ The school is a medium-sized 
suburban school with a predominantly white middle-class population and a strong tradition of 
effective science education. Early in the Spring 1991 semester, the middle school science 
teachers were asked to revise the end-of-year activities for their students and were encouraged 
to include alternate assessment in this revision. One of the seventh-grade teachers had 
attended a three-day workshop to learn the essentials of this approach, so she provided 
conceptual information to the others. One of the eighth grade teachers has long been an 
advocate of hands-on teaching and testing. She is the ringleader for the project. 

At the two grade levels, various traditional and alternate assessments have been used 
during the past five years. In general, one-third of the year-end exam is multiple-choice and 
two-thirds of the questions use alternative modes such as group tasks, open-ended laboratory 
experiences, extended essays based on data collected by the students, and concept-mapping. 
Group presentations to the rest of the class are often used as evaluation tools for unit work. 

Methods 

There are three sources of data for this study: interviews, field notes and document 
analysis. I attended teacher planning meetings and test days in years one, two, and five and 
kept field notes. I interviewed the teachers in each of those years and spoke informally to 
them in years three and four. Copies of the assessments themselves and an article written by 
one of the teachers are the third source of data. Documents such as these track changes made 
over the years. 

¹ To honor the teachers’ work, I asked each teacher whether they preferred that I use their 
own name or a pseudonym. Thus, real names and pseudonyms are intermingled. 

I transcribed the interviews and field notes and coded them using the constant 
comparative method (Strauss & Corbin, 1990). Using the codes, I constructed two 
conceptually-clustered matrices (Miles & Huberman, 1984) (Figures 1a, 1b, 2a, 2b). On such a 
matrix, three kinds of entries are made. A label captures in a single word each informant’s 
characteristic response. An explanation elaborates on the label where necessary. A quotation 
captures the essence of a person’s thought and adds richness to the picture. I added a fourth 
element, a level of use label, drawn from the work of Hall (1992). Hall defines eight levels 
of use of an innovation from non-use to renewal (Figure 3). I assigned a level to each of the 
teachers I interviewed. These levels enhanced comparison of the matrices between the first 
and fifth years. 

After constructing the matrices, I was able to get an overview of a large amount of data. 
Patterns and directions of movement emerged. By reading rows, I could get a picture of an 
individual’s responses to several issues. By reading columns, I could compare 
teachers’ responses. 
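
The row and column readings described above can be sketched as a small data structure. This is only an illustrative sketch, not the instrument used in the study: the class name `Entry`, the helpers `read_row` and `read_column`, the issue names, and every teacher name, label, and quotation below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One cell of the matrix, holding the three kinds of entries."""
    label: str        # single-word summary of the informant's response
    explanation: str  # elaboration of the label, where needed
    quotation: str    # a quotation capturing the person's thought


# Rows are informants, columns are issues. All names and text here are
# hypothetical placeholders, not data from the study.
matrix = {
    "Teacher A": {
        "time": Entry("label-1", "explanation-1", "quotation-1"),
        "space": Entry("label-2", "explanation-2", "quotation-2"),
    },
    "Teacher B": {
        "time": Entry("label-3", "explanation-3", "quotation-3"),
        "space": Entry("label-4", "explanation-4", "quotation-4"),
    },
}

# The added fourth element: one level-of-use label per teacher (placeholders).
level_of_use = {"Teacher A": "IVA", "Teacher B": "V"}


def read_row(matrix, teacher):
    """Return one informant's responses across all issues."""
    return matrix[teacher]


def read_column(matrix, issue):
    """Return every informant's response to a single issue."""
    return {
        teacher: cells[issue]
        for teacher, cells in matrix.items()
        if issue in cells
    }
```

Calling `read_row(matrix, "Teacher A")` gives one teacher's entries on every issue, while `read_column(matrix, "time")` lines up all teachers on a single issue for comparison.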

Results and Discussion 

A. Changes over five years 

The conceptually-clustered matrices, field notes and copies of the assessment questions 
all augment the impression of gradual growth over the five years. New teachers who were 
hired were selected, in part, because of a willingness to experiment. Each of them, in the 
interviews, credits the experienced teachers with showing them how to conduct effective 
assessments and supporting their early attempts. Thus, the matrices for 1995 (Figure 2) show 
that all six teachers are at the top three levels of integration. Although they still struggle with 
logistics, all are convinced that the benefits in increased student learning are worth the cost. 

Although multiple-choice exams continue to be given, questions in year five called for 
more higher-level thinking. For example, seventh graders are asked to construct their own 
taxonomic key and then use it to answer multiple-choice questions. The eighth-grade 
assessment has a laboratory practical component with seven stations and materials that the 
students must manipulate (Cappiello, 1994). Over the five years, the activities at the stations 
have been changed so that the assessment is more efficient to set up and administer. For example, in the 
early years, batteries burned out or things broke before all students had a chance to work with 
them. A question dealing with the differing densities of oil, water and alcohol created a 
gloppy mess after the first few rotations. Experience has helped the eighth-grade teachers 
avoid such problems. 

Another way in which the eighth-grade assessment has changed is that more of the 
laboratory questions are open-ended. For example, students are asked to record the masses 
of several objects and the forces needed to drag them up an inclined plane. They must design 
a data table to record their results and then explain the results in terms of mass, friction and 
force. Another station provides students with a variety of toys, watches and rulers and asks 
them to devise and solve their own problem (Cappiello, 1994). 

The one assessment that has changed the least is the seventh grade group-work exam. 
This is unfortunate because it has never worked well. Groups of students are presented with 
a scenario of scientists trying to decide if acid rain affects the growth of plants. Each group 
is given three plants: A, B, and C. A has a large amount of acid in the soil, B has a small 
amount, and C is the control. Students are to measure stems and leaves, create a data table 
and bar graph, and then explain their response to the question, "Did the acid make a 
difference?" The problem is that the pea or bean seedlings need to be started several weeks 
beforehand. Either the plants get a late start, or they dry up over the weekend, or they 
collapse after the first group of seventh graders measures them. The teachers like the 
group-work aspect of this problem, but they have never managed the logistics of growing the plants. 
Each year they plan carefully to avoid the problem of the preceding year, only to encounter a 
different problem. Ms. Wood explained that almost all available examples of alternate 
assessment are for physical sciences, and none of the seventh-grade teachers has had the 
time to step back and design a different group-work problem to replace this one. 

B. Factors that discourage sustained adoption 

In year five, just as in year one, the teachers were concerned about the practicality of 
alternate assessment. These concerns focused on three issues: time for preparation, 
administration and scoring; space to set up the assessment; and adequate resources to conduct 
the tests they designed. Of these, time was the predominant concern.² 

It’s so hectic in terms of teaching five classes, monitoring a study hall, meeting with 
guidance, parent conferences etc. ...You can’t just stop and say, okay, now I’m going to 
take time to figure this out. [Ms. Quackenbush, 11/20/91] 

The toughest part is for us to get together to meet and plan it.... That’s probably the 
biggest problem. [Ms. Cappiello, 11/25/91] 

² Quotations have been minimally edited for readability. 

Planning the assessment is only one aspect of the time needed. Getting out all the 
equipment for the hands-on activity, setting it up, and then putting it away again takes 
uncounted hours. Janitors, school staff and I all lent a hand. Without this extra help, the 
assessment would have been impossible. After the exam, scoring took time: 

The correcting took over eighteen hours. Some of it was done while the students were 
being entertained by the other teachers at a field day at the town park and the rest of it 
was, as usual, at home, at night. [Ms. Quackenbush, 11/20/91] 

Patience is another aspect of time that is essential for effective use of alternate 
assessment. The only way to find out how an assessment works is to give it to students and 
see how they respond. It may take several iterations to get the question to work the way the 
designer intends. 

You have to try it with the kids. They may come up with fantastic insights... or they 
may interpret it totally differently than what you intended. [Ms. Wood, 3/2/96] 

Administrators and policy makers who want quick, visible results are sure to be frustrated by 
the amount of time it takes to get the results they envision. 

Space is also an issue. In this school, there is an exam week in which classes are 
suspended. Students only come for the hours in which their own tests are scheduled. This 
means that there are empty rooms that the alternate assessment can spread into. The 
gymnasium and extra classrooms were used to accommodate this need. It is much more 
difficult to find space for ongoing assessments throughout the year: 

I have so many kids in this classroom. All of the desks are filled. I have such limited 
space. I tried the first test to do lab stations. It was a fiasco, completely! There was 
no room... It was hazardous... I was crushed. [Ms. Poodiack, 11/20/91] 

Alternate assessment is resource intensive. Someone has to purchase requisite 
quantities of tin foil, paper cups, eye droppers, mass balances, graph paper, and so on. Money 
must be in the budget to pay for the materials; there must be a place to store them; there must 
be a plan to replace expendables and to repair classroom equipment. Some districts have 
resource centers that supply kits to teachers; other districts have an account at the local 
hardware store. Whatever the solution, the resource issue is a major practical problem which 
must be addressed. 

My field notes for 1992 reveal another practical concern. To work effectively, wide-scale 
alternate assessment requires a great deal of good will within the school staff. In this 
school, the gym teachers vacate the gym on assessment day so the laboratory stations can be 
set up, janitors help move furniture and have more to clean up than usual, and other teachers 
pitch in to help in various ways. In 1992, the school was having a union dispute and 
everyone was on work-to-rule. Good will was not present in the school, and the fledgling 
alternate assessment program almost didn’t survive. Three teachers told me privately that 
they were so exasperated with logistical problems, they’d never do alternate assessment again. 
Fortunately, the next year good will was re-established. Intangible issues of tone and mood 
affect a school’s ability to sustain innovation and need to be taken seriously. 

C. Factors which encourage sustained adoption 

By the fifth year, every teacher interviewed had no doubt that the benefits of alternate 
assessment outweighed the practical problems. Four factors were consistently mentioned to 
explain this belief. 

Dissatisfaction with the status quo. Teachers in this sample expressed dissatisfaction 
with conventional means of testing in two ways. Several spoke of their frustration when 
students memorize material only for the duration of the testing period. Others implied their 
dissatisfaction by speaking of their vision of how things ought to be: 



If I’m going to fill their heads with a lot of memorized facts, how much are they 
going to remember? A lot of that stuff I don’t even remember. My goal is to give 
them the concepts, ideas.... The detail - give them the opportunity to look that up in 
references. [Ms. Cappiello, 11/25/91] 



You can only give them recall tests and assault their self esteem so often before they 
just kind of give up and don’t even want to try. So I think some of these kids will be 
pleasantly surprised when they get their tests back and feel good about themselves. 
[Ms. Quackenbush, 11/20/91] 



The teacher who had the most doubts about alternate assessment was the one who 
appeared least dissatisfied with conventional testing: 

I think the bottom line is basic knowledge and being able to apply it. Some of the 
problems — you did have to apply some knowledge, but I think a lot of the questions 
were very open-ended and there wasn’t much direction and I think there’s a lot of 
direction in science. So I just hope that people don’t try to make it 100% alternative 
type of situation. [Mr. Patton, 6/18/91] 



Belief that alternate assessment will help student learning. Alternate assessment 
provides a great deal of intrinsic satisfaction to teachers who see students respond positively: 

But I would say, for the majority of the kids, it’s good. They need to be trained, this 
is new for them. They need to learn how to do this. They need to learn what their 
responsibilities are as part of the group. [Ms. Feldman, 11/12/91] 

Still, we can’t forget that there’s got to be some joy and some - it’s got to be a fluid 
thing, learning. I think one of our big jobs in the middle school is to get kids excited 
about learning. If we turn them off now, that’s too bad. [Ms. Bosworth, 12/10/91] 

Each of the teachers independently commented that one of the major benefits of 
alternate assessment is that students with a wider variety of abilities and learning styles can 
succeed: 

I was afraid that all my enriched kids would get 100 and everybody else would fail... 
But I had kids in the classes that were not enriched that... a couple got hundreds and 
there was a nice spread... It could be real nice for them to realize, hey, you weren’t in 
the enriched class but you did better than some of the kids who were and you took the 
same test. [Ms. Quackenbush, 11/20/91] 

When asked during an interview what recommendations they would have for schools 
considering adopting this mode of teaching, several teachers mentioned the benefits of hearing 
other teachers’ enthusiasm: 

I think when you get into a program you need to teach your staff — to give your staff 
a chance to go out to schools where it’s being used, where you can see how successful 
it is, where you can hear all the teachers’ excitement and just feel it. Then you will 
buy into it because you’re a teacher, because you care about kids. [Ms. Brown, 
12/10/91] 



Collegial school climate. The matrices offer only a sample; the original interviews 
are infused with appreciation for the ability to work together as colleagues. 

All the teachers on my team are dynamic. They are constantly trying to find better 
ways to make sure kids really learn. Since our rooms are close together, we 
frequently chat in the halls. I find that very helpful. [Ms. Wood, 4/3/96] 

The eighth grade teachers negotiated with the administration to arrange their schedules so 
they are all free at the same hour, thus enabling common planning time. The seventh grade 
teachers would like to make a similar arrangement. 

In the first two years, Mr. Patton, who was most dubious about the benefits of 
alternate assessment, expressed his concern in a way that was positive and that could be heard 
by the other teachers. They took his objections into account and devised a test which had a 
stronger academic content. Having had his major fears allayed, Mr. Patton participated 
willingly in the new assessment. By taking his legitimate concerns seriously, the others made 
him into an ally rather than a resister. After Mr. Patton moved to the high school in the same 
district, Ms. Cappiello said, "I really miss him. He kept me honest. He kept me focused on 
science." 

Flexibility and acceptance of teachers’ professional decisions. It is a commonplace in 
psychology that before people can take a risk, they must be assured that the costs of failure 
are not too high. When the administrators established the goal of instituting alternate 
assessment, but left the design decisions to the teachers, they enabled the teachers to change 
plans as needed to make the assessments go smoothly. 

In the first year, the teachers did some interesting things to create the necessary sense 
of security. The assessment was counted as a unit test rather than as a final exam. That way, 
if there was something seriously wrong with the assessment itself, the students would not be 
overly penalized for a poor score. 

A pattern, initiated at the beginning and sustained over the years, is that there is a 
common core of questions, but each teacher adds a few questions that are specific to material 
that they have emphasized. The tailor-made test allows for teacher individuality and frees 
them from "teaching to the test". [Field notes, 6/11/91, 3/6/96] 

The hands-on assessment taken by the eighth graders has ten questions but only eight are 
counted. The lowest two scores are dropped. During the exam, some equipment invariably 
creates problems. Scores for those stations can be discarded. In the seventh grade, 
cooperative groups work together on the exercise. They are allowed to ask their teachers 
process questions during the test. This assures the students and the teachers that there will 
be no failures caused by a lack of understanding of what is expected of them. These design 
decisions by the teachers do a great deal to allay anxieties about this new way of evaluating 
learning. [Field notes, 6/11/91, 3/6/96] 
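
The eighth-grade scoring rule (ten questions, lowest two scores dropped) can be illustrated with a short sketch. The function name `station_total`, the parameter `counted`, and the ten-point scale in the example are assumptions made for illustration; the source does not say how many points each station was worth.

```python
def station_total(scores, counted=8):
    """Sum the best `counted` scores, dropping the lowest ones.

    With ten scores and counted=8, the two lowest scores are discarded,
    so a station spoiled by faulty equipment need not hurt the total.
    """
    if len(scores) < counted:
        raise ValueError("fewer scores than the number to be counted")
    # Sort from highest to lowest and keep only the top `counted` scores.
    best = sorted(scores, reverse=True)[:counted]
    return sum(best)


# Hypothetical student: two stations scored 0 because equipment failed.
example_scores = [10, 9, 8, 7, 6, 5, 4, 3, 0, 0]
example_total = station_total(example_scores)
```

In this hypothetical case the two zeros are the scores dropped, so the equipment failures do not lower the student's total.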

Flexibility and patience on the part of the supervisors and the principal are also 
credited by the teachers as an important factor enabling them to take risks. 

If you fail, the administrators don’t blame you. They say, "This is being tried, and if 
it doesn’t work 100%, that’s to be expected." [Ms. Bosworth, 3/6/96] 

Another example of flexibility is that the seventh and eighth grade teachers are encouraged to 
use very different styles of alternate assessment. These are in tune with developmental 
differences in the students and with the different nature of the content in life and physical 
sciences. 

Conclusion 

Elmore (1996) says his list of factors enabling teachers to learn about new ways of 
teaching "seems plausible". Elmore’s suggestions and those of Sparks and Loucks-Horsley 
(1990) are supported by the experience of the ten teachers in this study. Over five 
years, the teachers learned through trial and error in their own classrooms and through rich 
collegial interactions. Administrators allowed time for adoption and provided vigorous support for the 
innovation. The reason to adopt alternate assessment that all the teachers find most 
compelling is the experience that their students are becoming more articulate in conveying a 
deep understanding of the science being taught. 

Practicality is the Achilles’ heel of alternate assessment. "Trying to fix the airplane 
while it’s in the air" is a good metaphor for the problems. Introducing effective new 
assessment modes will take large quantities of the two commodities which are scarcest in 
schools: time and money for staff support. Exhortation and mandates will not make alternate 
assessment spring full-blown into American schools. Teachers who are committed to better 
learning outcomes for their students are intrigued by the potential of these new techniques. 
But judging from the experiences of these teachers, resolving practical issues is the key to 
long-term implementation of alternate assessment. 

Recommendations 

As more states consider implementing alternate assessment on a large scale, two ideas 
from this study are worth considering. Because it takes so much time to develop and revise 
alternate approaches, there should be some central repository of "tried and true" assessments. 
Perhaps the National Science Teachers Association or some other group could establish a web 
site. There is a particular need for successful approaches to assessing the life sciences. 
When teachers do create an effective question or task, they could post it on the web site. 

The conventional way of disseminating new approaches has been to hold workshops 
for teachers. In view of the importance of appropriate administrative support, it may be 
important to plan wide-scale workshops designed especially for administrators in which they 
share successful strategies for supporting the teachers in their schools who are implementing 
alternate assessment. 

LEVEL OF USE             BEHAVIORAL INDICES OF LEVEL

VI   RENEWAL             THE USER IS SEEKING MORE EFFECTIVE ALTERNATIVES TO THE
                         ESTABLISHED USE OF THE INNOVATION.

V    INTEGRATION         THE USER IS MAKING DELIBERATE EFFORTS TO COORDINATE
                         WITH OTHERS IN USING THE INNOVATION.

IVB  REFINEMENT          THE USER IS MAKING CHANGES TO INCREASE OUTCOMES.

IVA  ROUTINE             THE USER IS MAKING FEW OR NO CHANGES AND HAS AN
                         ESTABLISHED PATTERN OF USE.

III  MECHANICAL USE      THE USER IS USING THE INNOVATION IN A POORLY
                         COORDINATED MANNER AND IS MAKING USER-ORIENTED CHANGES.

II   PREPARATION         THE PERSON IS PREPARING TO USE THE INNOVATION.

I    ORIENTATION         THE PERSON IS SEEKING OUT INFORMATION ABOUT THE
                         INNOVATION.

0    NONUSE              NO ACTION IS BEING TAKEN WITH RESPECT TO THE
                         INNOVATION.

Figure 3: Levels of use of the innovation: Typical behaviors (Hall, 1992)